Show code
pacman::p_load(igraph,tidygraph,ggraph,visNetwork,ggmap,tmap,ggplot2,tidyverse,graphlayouts,jsonlite,zoo,plotly)Huynh Minh Phuong
Oceanus has enlisted FishEye International’s assistance in identifying potentially illegal fishing companies. FishEye’s analysts were provided with import/export data for Oceanus’ marine and fishing sectors, but the data was incomplete. To aid their analysis, FishEye transformed the trade data into a knowledge graph to understand business relationships, specifically to combat illegal, unreported, and unregulated (IUU) fishing and protect affected marine species. While node-link diagrams provided a high-level overview of the knowledge graph, FishEye now seeks visualizations that offer more detailed patterns about entities within the graph. The analysis consists of two main parts.
Firstly, FishEye aims to visualize temporal patterns to determine if companies engaging in illegal fishing activities have reemerged under different names after shutting down. They seek assistance in comparing the activities of these companies over time.
Secondly, FishEye has employed various tools, including artificial intelligence, to reason on the knowledge graph and propose additional links to expand the dataset. They have presented 12 groups of link suggestions and require aid in evaluating the reliability of these tools for completing the graph. FishEye is particularly interested in identifying new temporal patterns or anomalies that arise when new links are added.
We aim to use visual analytics to help FishEye identify companies that may be engaged in illegal fishing.
The graph is structured in a JSON format designed to align with d3’s node-link format and ensure compatibility with networkx.node_link_graph. At the top level, it consists of a dictionary containing graph-level properties, indicated by keys such as “directed,” “multigraph,” and “graph.” The “nodes” and “links” keys each hold a dictionary of nodes and links, respectively.
For nodes, the entries must include a unique “id” key for each node. As for links, they consist of “source” and “target” keys, referring to the node id values. Any additional keys provided within the node and link dictionaries serve as attributes for those specific nodes or links.
Node Attributes:
id – Name of the company that originated (or received) the shipment
shpcountry – Country the company most often associated with when shipping
rcvcountry – Country the company most often associated with when receiving
dataset – Always ‘MC2’
Edge Attributes:
arrivaldate – Date the shipment arrived at port in YYYY-MM-DD format.
hscode – Harmonized System code for the shipment. Can be joined with the hscodes table to get additional details.
valueofgoods_omu – Customs-declared value of the total shipment, in Oceanus Monetary Units (OMU)
volumeteu – The volume of the shipment in ‘Twenty-foot equivalent units’, roughly how many 20-foot standard containers would be required. (Actual number of containers may have been different as there are 20ft and 40ft standard containers and tankers that do not use containers)
weightkg – The weight of the shipment in kilograms (if known)
dataset – Always ‘MC2’
type – Always ‘shipment’ for MC2
generated_by – Name of the program that generated the edge. (Only found on ‘bundle’ records.)
The main packages for plotting graphs are: igraph, tidygraph, ggraph, visNetwork, sf, sfnetworks. The rest of the packages are for data wrangling.
Use fromJson from package jsonlite to import main graph data from json file format
We will ignore the dataset column and use select to select id, shipping country and receiving country information. Use glimpse to view the data.
Rows: 34,576
Columns: 3
$ id <chr> "AquaDelight Inc and Son's", "BaringoAmerica Marine Ges.m.b…
$ shpcountry <chr> "Polarinda", NA, "Oceanus", NA, "Oceanus", "Kondanovia", NA…
$ rcvcountry <chr> "Oceanus", NA, "Oceanus", NA, "Oceanus", "Utoporiana", NA, …
Use as.factor to convert the character data type to factor.
id shpcountry rcvcountry
-1 : 1 Marebak : 3657 Oceanus :29312
-10 : 1 Alverossia: 600 Coralmarica : 552
-100 : 1 Isliandor : 437 Sol y Oceana: 277
-1000 : 1 Merigrad : 435 Marebak : 151
-10000 : 1 Oceanus : 418 Riodelsol : 99
-10001 : 1 (Other) : 6670 (Other) : 1276
(Other):34570 NA's :22359 NA's : 2909
We observed many NA in the shpcountry and rcvcountry columns. Since we are unable to infer the values for these columns, we need to remove rows with NA values.
We use group_by to check for each company id how many different shipcountry and rcvcounty are available to identify the overall trend. Each company only ship to 1 receiving country and ship from 1 shipping country.
id no_shpcountry no_rcvcountry
-1 : 1 Min. :1 Min. :1
-10 : 1 1st Qu.:1 1st Qu.:1
-100 : 1 Median :1 Median :1
-1000 : 1 Mean :1 Mean :1
-10000 : 1 3rd Qu.:1 3rd Qu.:1
-10001 : 1 Max. :1 Max. :1
(Other):34570
We only considered edges information of nodes in MC2_nodes and used filter to filter out the edges for those nodes. We ignored the dataset and type column and used select to select the rest of the attributes of edges of the main graph. Used glimpse to view the data.
Rows: 5,309,087
Columns: 8
$ source <chr> "AquaDelight Inc and Son's", "AquaDelight Inc and Son…
$ target <chr> "BaringoAmerica Marine Ges.m.b.H.", "BaringoAmerica M…
$ arrivaldate <chr> "2034-02-12", "2034-03-13", "2028-02-07", "2028-02-23…
$ hscode <chr> "630630", "630630", "470710", "470710", "470710", "47…
$ weightkg <int> 4780, 6125, 10855, 11250, 11165, 11290, 9000, 19490, …
$ volumeteu <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ valueofgoodsusd <dbl> NA, NA, NA, NA, NA, NA, 87110, 188140, NA, 221110, 58…
$ valueofgoods_omu <dbl> 141015, 141015, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
Wrong data types for arrivaldate as well as other character type data need to be converted.
We used as.factor to convert source, target, hscode. mutate and ymd were used to convert arrival date data type to date and extract day of the week information from the date with wday
MC2_edges<-MC2_edges %>%
mutate(arrivaldate = ymd(arrivaldate)) %>%
mutate(weekday = wday(arrivaldate,
label = TRUE,
abbr = FALSE)) %>%
mutate(monthyear=as.yearmon(arrivaldate)) %>%
mutate(year=year(arrivaldate))%>%
filter(source!=target)
MC2_edges$source<-as.factor(MC2_edges$source)
MC2_edges$target<-as.factor(MC2_edges$target)
glimpse(MC2_edges)Rows: 5,158,933
Columns: 11
$ source <fct> AquaDelight Inc and Son's, AquaDelight Inc and Son's,…
$ target <fct> BaringoAmerica Marine Ges.m.b.H., BaringoAmerica Mari…
$ arrivaldate <date> 2034-02-12, 2034-03-13, 2028-02-07, 2028-02-23, 2028…
$ hscode <chr> "630630", "630630", "470710", "470710", "470710", "47…
$ weightkg <int> 4780, 6125, 10855, 11250, 11165, 11290, 9000, 19490, …
$ volumeteu <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ valueofgoodsusd <dbl> NA, NA, NA, NA, NA, NA, 87110, 188140, NA, 221110, 58…
$ valueofgoods_omu <dbl> 141015, 141015, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ weekday <ord> Sunday, Monday, Monday, Wednesday, Monday, Monday, We…
$ monthyear <yearmon> Feb 2034, Mar 2034, Feb 2028, Feb 2028, Sep 2028,…
$ year <dbl> 2034, 2034, 2028, 2028, 2028, 2028, 2028, 2028, 2028,…
summary was used to get a sense of data distribution. Since volumeteu, valuesofgoods_usd and valueofgoods_omu mostly contain 0 or NA values, we can disregard these columns in the analysis. We also observed that Friday seems to have the highest number of shipments while Wednesday has the lowest number of shipments.
source
Fresh Limited Liability Company : 152521
Uttar Pradesh s CJSC : 116151
bǐ mù yú Sagl Distribution : 89207
Coastal Cruisers Pic Shipping : 82010
Chhattisgarh Marine ecology A/S Delivery: 81212
Mar y Luna Sagl : 76853
(Other) :4560979
target arrivaldate
hǎi dǎn Corporation Wharf : 448746 Min. :2028-01-01
Caracola del Sol Services : 190708 1st Qu.:2029-09-16
Costa de la Felicidad Shipping : 160527 Median :2031-05-06
Panope Limited Liability Company: 158081 Mean :2031-06-03
Saltwater Supreme ОАО Forwading : 157800 3rd Qu.:2033-02-26
Mar del Este CJSC : 126180 Max. :2034-12-30
(Other) :3916891
hscode weightkg volumeteu valueofgoodsusd
Length:5158933 Min. : 0 Min. : 0.0 Min. :0.000e+00
Class :character 1st Qu.: 3475 1st Qu.: 0.0 1st Qu.:2.843e+04
Mode :character Median : 11070 Median : 0.0 Median :7.385e+04
Mean : 38794 Mean : 1.5 Mean :8.716e+05
3rd Qu.: 19960 3rd Qu.: 0.0 3rd Qu.:1.601e+05
Max. :495492485 Max. :1215.0 Max. :2.258e+11
NA's :412222 NA's :2779774
valueofgoods_omu weekday monthyear year
Min. : 1100 Sunday :788160 Min. :2028 Min. :2028
1st Qu.: 148130 Monday :745243 1st Qu.:2030 1st Qu.:2029
Median : 504485 Tuesday :675274 Median :2031 Median :2031
Mean : 1665142 Wednesday:601902 Mean :2031 Mean :2031
3rd Qu.: 1202560 Thursday :691915 3rd Qu.:2033 3rd Qu.:2033
Max. :44744530 Friday :866322 Max. :2035 Max. :2034
NA's :5158652 Saturday :790117
We use n_distinct() to check the number of hscode available in the dataset. There are many hscode values that are not relevant to our project. For example, hscode 470710 is for shipment of paper or paperboard. hscode that are relevant to our fish data will include only those that starts with:
“301”: Live Fish
“302: Fresh or chilled fish exclude fish fillets and fish meat in 304
“303”: Frozen fish
“304”: Fish fillets and fish meat (fresh, chilled or frozen)
“305”: Fish, dried, salted or in brine
“1604”: Extracts and juices of meat, fish or crustaceans molluscs or other aquatic invertebrates
“1605”: Prepared or preserved fish
we use substr() to extract the first 3 digits from hscode and then filter rows with only the hscode that starts with 301, 302, 303, 304, 305. We also extract the first 4 digits from hscode and filter those with 1604 and 1605.
MC2_edges_filtered<-MC2_edges %>%
mutate(sub_hscode3=substr(hscode,1,3)) %>%
mutate(sub_hscode4=substr(hscode,1,4)) %>%
filter(sub_hscode3 %in% c("301","302","303","304","305")|sub_hscode4 %in% c("1604","1605"))
MC2_edges_filtered$hscode<-as.factor(MC2_edges_filtered$hscode)
summary(MC2_edges_filtered$hscode) 304620 160414 304610 160521 304710 304750 304810 160510 160529 304870
87327 81628 36540 30238 27165 23911 21418 20530 14274 11508
303890 304720 304890 304990 160540 160420 303230 304740 303660 160413
10984 9714 9597 9444 9199 8977 8612 8505 7370 7148
303120 303130 160553 304790 304830 160419 304820 305320 160411 303630
6972 6690 4758 4530 3720 3007 2675 2192 2111 1960
303110 160556 305420 303310 304940 160415 303540 305410 160416 160551
1883 1871 1785 1634 1588 1446 1385 1248 1246 1225
160554 303290 303640 304390 305510 303420 303510 303140 160417 303530
1175 1072 1049 751 751 730 721 700 690 615
305630 305590 303690 303390 303330 304840 303670 303910 160559 304910
612 592 588 410 396 384 383 363 357 355
303590 303240 303830 304850 160552 304950 160555 160412 160558 305200
324 317 316 280 260 236 218 201 176 150
160432 303810 304690 303410 303430 304420 305490 301190 304490 303550
145 140 137 128 127 117 112 111 107 105
303570 160530 305100 305530 303250 305620 305690 302460 303820 305540
100 88 86 79 78 78 78 74 68 63
301110 303840 160557 303190 304410 304460 304320 305790 305390 (Other)
62 62 59 58 54 50 48 48 43 807
Top 10% of the source with the highest number of shipments account for 80% of the total shipments
Top_sources_tw<-MC2_edges_filtered %>%
group_by(source) %>%
summarise(weight_total=n()) %>%
arrange(weight_total)%>%
mutate(cm_weight_pct=100*cumsum(weight_total)/sum(weight_total)) %>%
mutate(rec=1,
cm_pop_pct=100*cumsum(rec)/sum(rec))
Top_sources_tw %>%
ggplot(aes(cm_pop_pct,cm_weight_pct))+
geom_line()+
scale_x_continuous(name="Percent of population", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
scale_y_continuous(name="Cumulative weight percentage", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
ggtitle("Cumulative Percentage of Number of Shipments vs Percent of Population")+
theme_minimal()
Top 20% of the top sources by distinct number of target ship to 70% of the targets
Top_sources_dist<-MC2_edges_filtered %>%
group_by(source) %>%
summarise(weight_distinct=n_distinct(target)) %>%
arrange(weight_distinct)%>%
mutate(cm_weight_pct=100*cumsum(weight_distinct)/sum(weight_distinct)) %>%
mutate(rec=1,
cm_pop_pct=100*cumsum(rec)/sum(rec))
Top_sources_dist %>%
ggplot(aes(cm_pop_pct,cm_weight_pct))+
geom_line()+
scale_x_continuous(name="Percent of population", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
scale_y_continuous(name="Cumulative distinct weight percentage", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
ggtitle("Cumulative Percentage of Number of Distinct Targets vs Percent of Population")+
theme_minimal()
We filter the edge information to contain only the top sources that have not only highest number of total connections but also highest number of unique connections. We extracted 284 top sources based on this method.
Similar analysis is performed to identify the top targets that account for the majority of the weight of the graph as well as have the highest distinct number of sources.
Top_targets_tw<-MC2_edges_filtered %>%
group_by(target) %>%
summarise(weight_total=n()) %>%
arrange(weight_total)%>%
mutate(cm_weight_pct=100*cumsum(weight_total)/sum(weight_total)) %>% mutate(rec=1,
cm_pop_pct=100*cumsum(rec)/sum(rec))
Top_targets_tw %>%
ggplot(aes(cm_pop_pct,cm_weight_pct))+
geom_line()+
scale_x_continuous(name="Percent of population", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
scale_y_continuous(name="Cumulative weight percentage", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
ggtitle("Cumulative Percentage of Number of Shipments vs Percent of Population")+
theme_minimal()
Top_targets_dist<-MC2_edges_filtered %>%
group_by(target) %>%
summarise(weight_distinct=n_distinct(source)) %>%
arrange(weight_distinct)%>%
mutate(cm_weight_pct=100*cumsum(weight_distinct)/sum(weight_distinct)) %>%
mutate(rec=1,
cm_pop_pct=100*cumsum(rec)/sum(rec))
Top_targets_dist %>%
ggplot(aes(cm_pop_pct,cm_weight_pct))+
geom_line()+
scale_x_continuous(name="Percent of population", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
scale_y_continuous(name="Cumulative distinct weight percentage", limits=c(0,100), breaks=c(0,20,40,60,80,100))+
ggtitle("Cumulative Percentage of Number of Distinct Targets vs Percent of Population")+
theme_minimal()
We aggregate the weight of the filtered edges as well as the total weight in kg of the shipments based on source, target, hscode, weekday and monthyear.We will use this data for temporal analysis
source
Sea Breezes S.A. de C.V. Freight : 2753
Estrella de la Costa SRL : 1813
OceanSavor Surf Limited Liability Company Logistics: 1677
nián yú Ltd. Corporation : 1577
Shou gan Oyj Overseas : 1420
Kerala Market - Ges.m.b.H. Cargo : 1291
(Other) :85459
target hscode weekday
Mar del Este CJSC : 9299 304610 :16822 Sunday :14821
hǎi dǎn Corporation Wharf : 7794 160414 : 9302 Monday :13608
Caracola del Sol Services : 4536 304620 : 7872 Tuesday :13694
Pao gan SE Seal : 3731 304810 : 7761 Wednesday:10803
Costa de la Felicidad Shipping: 3686 303230 : 4798 Thursday :12317
Olas del Sur Ltd : 3230 304710 : 4747 Friday :15981
(Other) :63714 (Other):44688 Saturday :14766
monthyear year weight weightkg
Min. :2028 Min. :2028 Min. : 1.000 Min. : 0
1st Qu.:2030 1st Qu.:2029 1st Qu.: 1.000 1st Qu.: 21390
Median :2032 Median :2031 Median : 2.000 Median : 37220
Mean :2032 Mean :2031 Mean : 2.798 Mean : 63838
3rd Qu.:2033 3rd Qu.:2033 3rd Qu.: 2.000 3rd Qu.: 51960
Max. :2035 Max. :2034 Max. :160.000 Max. :3293520
To plot the overall network graph, we will further aggregate this data by source, target and year. As hscode 304620 has the highest count, we will also look at the graph in with this hscode.
We can see that the number of edges for nodes at the boundary of the graph decreases from 2028 to 2034. Since our graph is directed, this means that there are fewer shipments between the weakly connected nodes. The nodes with high degree centrality for this type of shipments do not change significantly from 2028 to 2034.
# |fig-width: 100
# |fig-height: 100
MC2_graph %>%
mutate(centrality = centrality_authority()) %>%
ggraph(layout = "fr")+
geom_edge_link(aes(width=weight),
alpha=0.2)+
scale_edge_width(range = c(0.1, 5))+
geom_node_point(aes(size = centrality, colour = centrality))+
theme_graph()+
scale_color_continuous(guide = 'legend') +
theme_graph()+
facet_edges(~year)+
th_foreground(foreground = "grey80",
border = TRUE) +
theme(legend.position = 'bottom')
Next we will use visnetwork to create an interactive graph and zoom into the high in degree centrality nodes. We can identify a few companies that have the highest degree centrality: Caracola del Sol Services, hǎi dǎn Corporation Wharf, and Mar del Este CJSC
We change the arrows option to from to identify the sources with high outdegree centrality. There are more companies with high out degree centrality such as Sea Breezes S.A. de C.V. Freight, Arena del Sol Pic.
It is noteworthy that the high indegree source Caracola del Sol Services identified in the previous link also have a high outdegree centrality. We can take a closer look at the network and the temporal trend of this company.
We used group_louvain() which implements the multi-level modularity optimization algorithm for finding community structure. It is based on the modularity measure and a hierarchical approach. Initially, each vertex is assigned to a community on its own. In every step, vertices are re-assigned to communities in a local, greedy way: each vertex is moved to the community with which it achieves the highest contribution to modularity. When no vertices can be reassigned, each community is considered a vertex on its own, and the process starts again with the merged communities. The process stops when there is only a single vertex left or when the modularity cannot be increased any more in a step.
We see a dip in the overall trend of the shipment volume from 2028 to 2031 and then an increase from 2031 to 2034
overall_trend <-MC2_edges_filtered%>%
group_by(monthyear) %>%
summarise(weight=n(),
weightkg=mean(weightkg)) %>%
arrange(monthyear) %>%
ungroup()
t<-overall_trend %>%
ggplot(aes(x = monthyear, y=weightkg))+
geom_point()+
geom_line()+
theme(legend.position="none")+
theme_minimal()+
ggtitle('Overall trend in the monthly average shipment volume in kg over time')+
scale_y_continuous(name="Average shipment volume in kg")+
scale_x_continuous(name="Month Year", breaks=c(2028,2029,2030,2031,2032,2033,2034,2035))+
theme_minimal()
ggplotly(t,
tooltip=c("x","y"))We observed the same trend the the monthly total number of shipments over time.
t2<-overall_trend %>%
ggplot(aes(x = monthyear, y=weight))+
geom_point()+
geom_line()+
theme(legend.position="none")+
theme_minimal()+
ggtitle('Overall trend in the monthly total number of shipments over time')+
scale_y_continuous(name="No of shipments")+
scale_x_continuous(name="Month Year", breaks=c(2028,2029,2030,2031,2032,2033,2034,2035))+
theme_minimal()
ggplotly(t,
tooltip=c("x","y"))Now we will take a closer look at the temporal shipment pattern of one of key node in shipments of hscode ‘304620’
key_node_28 <-MC2_edges_filtered%>%
filter(monthyear<'Jan 2029'&
(source=='Caracola del Sol Services' | target=='Caracola del Sol Services')) %>%
mutate(monthyear=as.factor(monthyear)) %>%
group_by(weekday,monthyear) %>%
summarise(weight=n(),
weightkg=mean(weightkg)) %>%
arrange(weekday,monthyear) %>%
ungroup()
key_node_28 %>%
ggplot(aes(monthyear,weekday, fill=weight))+
geom_tile(colour='white',
size=0.1)+
scale_fill_gradient(name = "No of shipments",
low = "sky blue",
high = "dark blue") +
theme_minimal()+
labs(x = NULL,
y = NULL,
title = "No of shipments by month year and weekday in 2028") +
theme(axis.text.x=element_text(angle=30, vjust=0.5),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6) )
key_node_34 <-MC2_edges_filtered%>%
filter(monthyear>='Jan 2034'&
(source=='Caracola del Sol Services' | target=='Caracola del Sol Services')) %>%
mutate(monthyear=as.factor(monthyear)) %>%
group_by(weekday,monthyear) %>%
summarise(weight=n(),
weightkg=mean(weightkg)) %>%
arrange(weekday,monthyear) %>%
ungroup()
key_node_34 %>%
ggplot(aes(monthyear,weekday, fill=weight))+
geom_tile(colour='white',
size=0.1)+
scale_fill_gradient(name = "No of shipments",
low = "sky blue",
high = "dark blue") +
theme_minimal()+
labs(x = NULL,
y = NULL,
title = "No of shipments by month year and weekday in 2034") +
theme(axis.text.x=element_text(angle=30, vjust=0.5),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6) )
We will plot the percentage change in the monthly total number of shipments over time by individual source to identify any abnormal increase or decrease in the number of shipments. From the graph, we can get the name of the company from the tooltip function.
st<-ggplot(source_trend, aes(x = monthyear, y = weight_pct_change,
colour=source))+
geom_point()+
geom_line()+
ggtitle('Percentage change of the monthly total number of shipments over time shipped by each source')+
scale_y_continuous(name="Percentage change of number of shipments")+
scale_x_continuous(name="Month Year", breaks=c(2028,2029,2030,2031,2032,2033,2034,2035))+
theme_minimal()+
theme(legend.position="none")
ggplotly(st,
tooltip=c('colour',
"x",
"y"))We will plot the percentage change in the monthly total volume of shipments over time by individual source to identify any abnormal increase or decrease in the volume of shipments. From the graph, we can identify these companies with the tooltip function.
stkg<-ggplot(source_trend, aes(x = monthyear, y = weightkg_pct_change,
colour=source))+
geom_point()+
geom_line()+
ggtitle('Percentage change of volume in kg of shipments over time shipped by each source')+
scale_y_continuous(name="Percentage change of volume in kg of shipments")+
scale_x_continuous(name="Month Year", breaks=c(2028,2029,2030,2031,2032,2033,2034,2035))+
theme_minimal()+
theme(legend.position="none")
ggplotly(stkg,
tooltip=c('colour',
"x",
"y"))We will perform the same analysis for the targets.
twkg<-ggplot(target_trend, aes(x = monthyear, y = weightkg_pct_change,
colour=target))+
geom_point()+
geom_line()+
ggtitle('Percentage change of volume in kg of shipments over time received by each target')+
scale_y_continuous(name="Percentage change of volume in kg of shipments")+
scale_x_continuous(name="Month Year", breaks=c(2028,2029,2030,2031,2032,2033,2034,2035))+
theme_minimal()+
theme(legend.position="none")
ggplotly(twkg,
tooltip=c('colour',
"x",
"y"))tw<-ggplot(target_trend, aes(x = monthyear, y = weight_pct_change,
colour=target))+
geom_point()+
geom_line()+
ggtitle('Percentage change of total number of shipments over time received by each target')+
scale_y_continuous(name="Percentage change of number of shipments")+
scale_x_continuous(name="Month Year", breaks=c(2028,2029,2030,2031,2032,2033,2034,2035))+
theme_minimal()+
theme(legend.position="none")
ggplotly(tw,
tooltip=c('colour',
"x",
"y"))